ABSTRACT

Sign language recognition uses computer technology to convert sign language into text or speech, facilitating communication between deaf or hard-of-hearing people and hearing people. This paper takes Chinese sign language words as its research object and proposes a new sign language recognition method based on a two-stream 3D-CNN and an LSTM network. First, a key frame extraction algorithm removes redundant frames from the original data. A two-stream 3D-CNN then learns local hand-change features and global trajectory features simultaneously, and the two are aggregated into a per-clip feature that is fed to an LSTM encoder-decoder network. To focus on the video frames that carry the meaning of a sign, a temporal attention mechanism is introduced into the LSTM encoder-decoder network. On the DEVISIGN-D sign language data set, the method was compared with three sign language recognition algorithms. The experimental results show that it recognizes isolated Chinese sign language words well, with an accuracy of 98.4%.

Keywords: 3D Convolutional Neural Network, Attention mechanism, Long Short-Term Memory Network, Sign language recognition, Key frame
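
The temporal attention step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so scaled dot-product scoring is assumed here. The names `temporal_attention`, `clip_features`, and `query` are hypothetical and are not taken from the paper. In this sketch, `clip_features` stands in for the aggregated two-stream 3D-CNN features of each video clip, and `query` stands in for a decoder hidden state.

```python
import math


def temporal_attention(clip_features, query):
    """Return a context vector and per-clip attention weights.

    clip_features: list of T feature vectors, each of length D
                   (assumed stand-in for aggregated two-stream features).
    query:         vector of length D (assumed stand-in for a decoder state).
    Scoring is scaled dot-product, which is an assumption of this sketch.
    """
    dim = len(query)
    # Relevance score of each clip with respect to the query.
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(dim)
        for feat in clip_features
    ]
    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    # Context vector: attention-weighted sum of the clip features.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, clip_features))
        for d in range(dim)
    ]
    return context, weights


# Toy example: three clips with two-dimensional features.
clip_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
context, weights = temporal_attention(clip_features, query)
```

In this toy example the first and third clips align with the query and receive equal weight, while the second clip receives less, which mirrors the idea of emphasizing the frames that carry the meaning of a sign.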